XML Document Physical Structure


While the logical structure of an XML document refers to its conceptual organization, the physical structure describes where the document's content is actually stored. The physical structure is composed of all the content used in the document, held in storage units called entities, which can be part of the document or external to it. With the exception of the document entity and the external DTD subset, each entity is identified by a name and has its own content, which may be anything from a single character inside the document to a large file existing outside the document.

In terms of an XML document's logical structure, entities are declared in the document type declaration within the prolog and referenced in the document element. When an XML processor sees an entity reference, it reads the entity's name, retrieves the content given in the matching entity declaration, and uses that replacement text in place of the reference. Non-text content such as a graphic or other media is held in an unparsed entity, which is named in an attribute value and handed to the application rather than expanded in place.

An entity can be either parsed or unparsed. A parsed entity, sometimes called a text entity, contains text data that becomes part of the XML document after processing. When the replacement text for an entity reference is substituted at parse time, the result must be well-formed XML. The contents of an unparsed entity may or may not be text. If they are text, they need not be XML, and the processor does not parse them.
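The difference between parsed and unparsed entities can be observed with a small script. The following is a minimal sketch using Python's standard xml.parsers.expat module. The element names (catalog, em, picture), the entity names (tagline, logo), the file name logo.gif, and the gif notation are all hypothetical and chosen only for illustration.

```python
import xml.parsers.expat

# Hypothetical document: "tagline" is a parsed entity whose replacement
# text contains markup; "logo" is an unparsed entity tied to a notation.
DOCUMENT = """<!DOCTYPE catalog [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY tagline "Fast <em>and</em> simple">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<catalog>&tagline;<picture src="logo"/></catalog>"""

started = []      # element names, in the order the parser reports them
text_chunks = []  # character data, possibly delivered in several pieces
unparsed = {}     # unparsed entity name -> (system id, notation name)


def on_start(name, attrs):
    started.append(name)


def on_text(data):
    text_chunks.append(data)


def on_unparsed_entity(entity_name, base, system_id, public_id, notation_name):
    # The parser only reports the declaration; it never reads logo.gif.
    unparsed[entity_name] = (system_id, notation_name)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.CharacterDataHandler = on_text
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.Parse(DOCUMENT, True)

print(started)
print("".join(text_chunks))
print(unparsed)
```

In this sketch the replacement text of the parsed entity is itself parsed, so the em element inside it is reported like any other element. The unparsed entity is only announced to the application by name, system identifier, and notation, and it is left to the application to decide what to do with the file.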

An example of an entity declaration follows:

<!ENTITY trademark "XML Instance™">

The trademark phrase can be plugged in anywhere it is referenced in the document, as shown below:

&trademark;
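To see the substitution happen, the declaration and the reference can be placed in one small document and run through a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names notice, product, and legal, and the sample sentence, are hypothetical; only the trademark entity comes from the example above.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the prolog (internal DTD subset) and
# referenced twice inside the document element.
# \u2122 is the trademark sign; the element names are hypothetical.
DOCUMENT = """<!DOCTYPE notice [
  <!ENTITY trademark "XML Instance\u2122">
]>
<notice>
  <product>&trademark;</product>
  <legal>&trademark; is used here as sample text.</legal>
</notice>"""

root = ET.fromstring(DOCUMENT)

# By the time the tree is built, each reference has been replaced
# by the entity's replacement text.
product_text = root.find("product").text
legal_text = root.find("legal").text

print(product_text)
print(legal_text)
```

Because the phrase is declared once and referenced by name, changing the declaration changes every place the reference appears.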

Copyright 2000 Extensibility, Inc.

Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516